Learning Objectives: In this tutorial, you will learn how to use the DemoKin package to analyze family structure and kinship networks, understand the mechanics of time-invariant models, and visualize changes in kinship relations across the life course.
Kinship is a fundamental property of human populations and a key form of social structure. Demographers have long been interested in the interplay between demographic change and family configuration. This has led to the development of sophisticated methodological and conceptual approaches for the study of kinship, some of which are explored in this tutorial.
Kinship analysis can answer a range of important questions:
In this tutorial, we will implement matrix kinship models using the
DemoKin package to calculate kin counts and age
distributions. We begin with the simplest model: a
time-invariant one-sex model. In this model, we assume
that everyone in the population experiences the same mortality and
fertility rates throughout their lives (e.g., the 2015 rates), and we
only trace female kin relationships.
Before starting the workshop, please ensure you complete the following preparatory steps:
# Install basic data analysis packages
install.packages("dplyr") # Data manipulation
install.packages("tidyr") # Data tidying
install.packages("ggplot2") # Data visualization
install.packages("readr") # Data import
install.packages("knitr") # Document generation
install.packages("data.table")# Efficient data handling
install.packages("Matrix") # Matrix operations
# Install DemoKin
# DemoKin is available on CRAN (https://cran.r-project.org/web/packages/DemoKin/index.html),
# but we'll use the development version on GitHub (https://github.com/IvanWilli/DemoKin):
install.packages("remotes")
remotes::install_github("IvanWilli/DemoKin")
Let’s begin by loading the necessary packages for our analysis:
library(dplyr) # For data manipulation
library(tidyr) # For restructuring data
library(ggplot2) # For visualization
library(readr) # For reading data
library(knitr) # For document generation
library(DemoKin) # For kinship analysis
Load additional utility functions that we’ve prepared for this tutorial:
The DemoKin package includes Swedish demographic data
from the Human Mortality Database (HMD) and Human Fertility Database
(HFD) as an example dataset. This includes:
You can view all available data in the package with
data(package="DemoKin").
Let’s examine a subset of the Swedish demographic data to understand its structure:
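One way to pull out such a subset is to index the matrices by age and year. This is only a sketch: it assumes that `swe_px` and `swe_asfr` carry ages as row names and years as column names, which is what the printed output suggests.

```r
library(DemoKin)

# Assumption: rows are named by age ("0", "1", ...) and columns by year ("1900", ...)
ages_px   <- as.character(0:5)      # youngest ages, where survival changes most
ages_asfr <- as.character(19:24)    # early reproductive ages
years     <- as.character(1900:1909)

px_subset   <- swe_px[ages_px, years]      # survival probabilities
asfr_subset <- swe_asfr[ages_asfr, years]  # age-specific fertility rates

px_subset
asfr_subset
```

Indexing by name rather than by position avoids off-by-one mistakes, since age 0 sits in row 1.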
## 1900 1901 1902 1903 1904 1905 1906 1907 1908
## 0 0.91060 0.90673 0.92298 0.91890 0.92357 0.92094 0.92717 0.93134 0.92217
## 1 0.97225 0.97293 0.97528 0.97549 0.97847 0.97844 0.98066 0.98175 0.97928
## 2 0.98525 0.98579 0.98630 0.98835 0.98921 0.98914 0.99050 0.99149 0.99135
## 3 0.98998 0.98947 0.99079 0.99125 0.99226 0.99112 0.99341 0.99351 0.99383
## 4 0.99158 0.99133 0.99231 0.99352 0.99272 0.99300 0.99392 0.99539 0.99526
## 5 0.99310 0.99253 0.99401 0.99388 0.99468 0.99394 0.99542 0.99587 0.99570
## 1909
## 0 0.93524
## 1 0.98415
## 2 0.99200
## 3 0.99429
## 4 0.99560
## 5 0.99624
## 1900 1901 1902 1903 1904 1905 1906 1907 1908
## 19 0.04409 0.04357 0.04742 0.04380 0.04523 0.04415 0.04779 0.04910 0.05205
## 20 0.06776 0.07122 0.06989 0.06792 0.06952 0.06981 0.07187 0.07211 0.07994
## 21 0.09643 0.09931 0.09613 0.09654 0.09546 0.09437 0.09761 0.10108 0.10547
## 22 0.12512 0.12555 0.12526 0.11899 0.12269 0.11923 0.12264 0.12384 0.12738
## 23 0.14631 0.14792 0.14743 0.14237 0.14304 0.14502 0.14433 0.14440 0.14694
## 24 0.16285 0.16847 0.16455 0.16279 0.15931 0.15960 0.16276 0.16271 0.16524
## 1909
## 19 0.05274
## 20 0.07930
## 21 0.10456
## 22 0.12639
## 23 0.14607
## 24 0.16087
For our time-invariant model, we need to extract the demographic rates for a single year. Let’s use 2015 as our reference year:
# Extract vectors for 2015
swe_surv_2015 <- swe_px[,"2015"] # Survival probabilities
swe_asfr_2015 <- swe_asfr[,"2015"] # Fertility rates
Let’s compare the data between different time periods to understand demographic changes. Here we compare values from 1950 and 2000:
## Survival probabilities (px):
## 1950 2000
## 0 0.98237 0.99717
## 1 0.99833 0.99984
## 2 0.99885 0.99986
## 3 0.99904 0.99996
## 4 0.99938 0.99988
## 5 0.99920 0.99992
##
## Fertility rates (asfr):
## 1950 2000
## 0 0 0
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
##
## Population counts:
## 1950 2000
## 0 57780 43058
## 1 60451 43599
## 2 61288 44356
## 3 62970 46880
## 4 63089 50383
## 5 62963 55150
Let’s visualize how mortality has changed over time. We’ll plot the probability of dying between ages \(x\) and \(x+1\) (denoted as \(q_x = 1-p_x\)) for different years:
swe_px %>%
as.data.frame() %>%
mutate(age = c(0:100)) %>%
pivot_longer(cols = -c(age), names_to = "year", values_to = "px") %>%
filter(year %in% seq(1950, 2020, 30)) %>%
mutate(qx = 1-px) %>%
ggplot() +
geom_line(aes(x = age, y = qx, col = as.character(year)), linewidth = 1) +
scale_y_log10() +
labs(
title = "Age-specific mortality in Sweden (1950-2010)",
subtitle = "Probability of dying between ages x and x+1",
x = "Age",
y = "Probability of dying (qx, log scale)",
col = "Year"
) +
theme_bw() +
theme(legend.position = "bottom")
Interpretation: This graph shows how sharply mortality declined at every age between 1950 and 2010 (the filter seq(1950, 2020, 30) selects the years 1950, 1980, and 2010). The log scale makes improvements visible at all ages, with particularly notable declines in infant and child mortality. The characteristic “bathtub” shape of human mortality is clearly visible: high mortality in infancy, followed by very low mortality through childhood and early adulthood, then a steady exponential increase with age.
Now, let’s examine how fertility patterns have changed over time:
swe_asfr %>%
as.data.frame() %>%
mutate(age = c(0:100)) %>%
pivot_longer(cols = -c(age), names_to = "year", values_to = "fx") %>%
filter(year %in% seq(1950, 2020, 30)) %>%
ggplot() +
geom_line(aes(x = age, y = fx, col = as.character(year)), linewidth = 1) +
labs(
title = "Age-specific fertility in Sweden (1950-2010)",
subtitle = "Fertility rates by age of mother",
x = "Age of mother",
y = "Age-specific fertility rate (fx)",
col = "Year"
) +
theme_bw() +
theme(legend.position = "bottom")
Interpretation: This visualization shows how fertility patterns have changed over the decades (the plotted years are 1950, 1980, and 2010). The 1950 curve shows earlier childbearing with higher peak fertility rates. By 2010, fertility has shifted to later ages, with lower peak rates but a wider distribution across ages, reflecting the postponement of childbearing in developed countries. We can also observe the declining total fertility rate (the area under each curve).
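The two summary measures mentioned in the interpretation can be computed directly from the rates. The sketch below assumes that the rows of `swe_asfr` are single years of age from 0 to 100 (as in the plotting code) and that deaths and births are spread evenly within each year of age, so births are placed at the midpoint of each age interval.

```r
library(DemoKin)

years <- c("1950", "1980", "2010")  # the years selected by seq(1950, 2020, 30)
ages  <- 0:100                      # rows of swe_asfr, one per single year of age

# Sum of the single-year rates: the area under each fertility curve
tfr <- colSums(swe_asfr[, years])

# Mean age of the fertility schedule (births placed at mid-year of age)
mean_age <- colSums((ages + 0.5) * swe_asfr[, years]) / tfr

round(rbind(tfr, mean_age), 2)
```

A falling `tfr` together with a rising `mean_age` would correspond to the lower and later curves seen in the plot.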
Finally, let’s look at how the population structure has evolved:
swe_pop %>%
as.data.frame() %>%
mutate(age = c(0:100)) %>%
pivot_longer(-age, names_to = "year", values_to = "pop") %>%
mutate(year = gsub("X", "", year)) %>%
filter(year %in% seq(1950, 2020, 30)) %>%
ggplot() +
geom_line(aes(x = age, y = pop, col = as.character(year)), linewidth = 1) +
labs(
title = "Female population structure in Sweden (1950-2010)",
subtitle = "Population counts by age",
x = "Age",
y = "Population count",
col = "Year"
) +
theme_bw() +
theme(legend.position = "bottom")
Interpretation: This graph shows how Sweden’s female population structure has changed over time (the plotted years are 1950, 1980, and 2010). The 1950 distribution shows the effects of baby booms and war years. By 2010, we see population aging with a more uniform distribution across ages and greater longevity, with significant numbers of women surviving to very old ages.
DemoKin is an R package designed to compute the number
and age distribution of relatives (kin) of a focal individual under
various demographic assumptions. It can analyze both living and deceased
kin, and allows for both time-invariant and time-varying demographic
rates.
kin() Function
The main function in the package is DemoKin::kin(),
which implements matrix kinship models to calculate expected kin
counts.
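To give a feel for what a matrix kinship model does, here is a toy version of the recursion for daughters described in Caswell (2019): the daughters alive when Focal is one age class older are the surviving daughters from the previous step plus the newborn daughters of Focal. The rates, the four age classes, and all object names are hypothetical, and this is not DemoKin’s internal code.

```r
# Hypothetical rates for four age classes (not Swedish data)
p <- c(0.9, 0.8, 0.7, 0)   # probability of surviving from one age class to the next
f <- c(0, 0.5, 0.3, 0)     # daughters per woman in each age class
n <- length(p)

# Survival matrix U: survival probabilities on the subdiagonal
U <- matrix(0, n, n)
U[cbind(2:n, 1:(n - 1))] <- p[-n]

# Fertility matrix Fmat: fertility rates in the first row
Fmat <- matrix(0, n, n)
Fmat[1, ] <- f

# daughters[, x]: expected age distribution of daughters when Focal is in age class x
daughters <- matrix(0, n, n)
for (x in 1:(n - 1)) {
  focal <- numeric(n)
  focal[x] <- 1  # Focal is alive in age class x
  daughters[, x + 1] <- U %*% daughters[, x] + Fmat %*% focal
}

colSums(daughters)  # expected number of living daughters by age class of Focal
```

Other kin types follow the same pattern with a different source of new kin at each step; for example, granddaughters are produced by daughters rather than by Focal.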
For our first example, we’ll run the simplest model with the following assumptions:
Let’s run the basic kinship model:
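A minimal call consistent with the more specific call used later in this tutorial might look as follows. It leaves `output_kin` unset, on the assumption that all kin types are then returned.

```r
library(DemoKin)

# Period rates for 2015, as extracted earlier in the tutorial
swe_surv_2015 <- swe_px[, "2015"]    # survival probabilities
swe_asfr_2015 <- swe_asfr[, "2015"]  # fertility rates

# Time-invariant, one-sex model; output_kin is left at its default
swe_2015 <- kin(
  p = swe_surv_2015,
  f = swe_asfr_2015,
  time_invariant = TRUE
)

names(swe_2015)
```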
The kin() function accepts several important
arguments:
In DemoKin, each type of relative is identified by a
unique code. These codes differ from those used in Caswell (2019). The following table shows the
relationship between these coding systems:
The kin() function returns a list containing two data
frames:
## List of 2
## $ kin_full : tibble [142,814 × 7] (S3: tbl_df/tbl/data.frame)
## ..$ kin : chr [1:142814] "d" "d" "d" "d" ...
## ..$ age_kin : int [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ age_focal: int [1:142814] 0 1 2 3 4 5 6 7 8 9 ...
## ..$ living : num [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ dead : num [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ cohort : logi [1:142814] NA NA NA NA NA NA ...
## ..$ year : logi [1:142814] NA NA NA NA NA NA ...
## $ kin_summary: tibble [1,414 × 10] (S3: tbl_df/tbl/data.frame)
## ..$ age_focal : int [1:1414] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ kin : chr [1:1414] "coa" "cya" "d" "gd" ...
## ..$ year : logi [1:1414] NA NA NA NA NA NA ...
## ..$ cohort : logi [1:1414] NA NA NA NA NA NA ...
## ..$ count_living : num [1:1414] 0.2752 0.0898 0 0 0 ...
## ..$ mean_age : num [1:1414] 8.32 4.05 NaN NaN NaN ...
## ..$ sd_age : num [1:1414] 6.14 3.68 NaN NaN NaN ...
## ..$ count_dead : num [1:1414] 0.0000633 0.000037 0 0 0 ...
## ..$ count_cum_dead: num [1:1414] 0.0000633 0.000037 0 0 0 ...
## ..$ mean_age_lost : num [1:1414] 0 0 NaN NaN NaN 0 0 0 0 NaN ...
kin_full Data Frame
This data frame contains detailed information on expected kin counts by:
- Age of the focal individual
- Type of kin
- Age of kin
- Living/dead status
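The sketch below shows one way to work with `kin_full`: it extracts the age distribution of living mothers when Focal is 30 and collapses it over age of kin. If `kin_summary` is an aggregation of `kin_full` over age of kin, which is our reading of the output structure above, the two counts should agree. The choice of mothers and of age 30 is arbitrary.

```r
library(DemoKin)
library(dplyr)

# Re-run the model only if it is not already in the workspace
if (!exists("swe_2015")) {
  swe_2015 <- kin(p = swe_px[, "2015"], f = swe_asfr[, "2015"], time_invariant = TRUE)
}

# Expected living mothers by age of mother when Focal is 30
mothers_30 <- swe_2015$kin_full %>%
  filter(kin == "m", age_focal == 30) %>%
  select(age_kin, living)

# Collapse over age of kin by hand
by_hand <- mothers_30 %>%
  summarise(
    count_living = sum(living),
    mean_age     = sum(living * age_kin) / sum(living)
  )

# Corresponding row of kin_summary
from_summary <- swe_2015$kin_summary %>%
  filter(kin == "m", age_focal == 30) %>%
  select(count_living, mean_age)

by_hand
from_summary
```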
One powerful way to visualize kinship structure is through a network or ‘Keyfitz’ kinship diagram (Keyfitz and Caswell 2005). Let’s see the expected number of living female relatives for a 65-year-old woman according to our model:
swe_2015$kin_summary %>%
filter(age_focal == 65) %>%
select(kin, count = count_living) %>%
plot_diagram(rounding = 2)
Interpretation: This Keyfitz diagram provides a comprehensive view of the kinship network for a 65-year-old woman in Sweden (based on 2015 demographic rates). The diagram shows:
This visualization helps us understand the changing composition of family networks across the life course.
Let’s run the model again, but this time we’ll specify exactly which kin types we want to analyze:
swe_2015 <-
kin(
p = swe_surv_2015,
f = swe_asfr_2015,
output_kin = c("c", "d", "gd", "ggd", "gm", "m", "n", "a", "s"), # Specific kin types
time_invariant = TRUE
)
Now, let’s visualize how the expected number of each type of relative changes over the life course:
swe_2015$kin_summary %>%
rename_kin() %>% # Convert kin codes to readable labels
ggplot() +
geom_line(aes(age_focal, count_living), linewidth = 1) +
theme_bw() +
labs(
title = "Expected number of living female relatives over the life course",
subtitle = "Based on Swedish demographic rates from 2015",
x = "Age of focal individual",
y = "Number of living female relatives"
) +
facet_wrap(~kin_label, scales = "free_y") # Use different y-scales for each panel
Interpretation: These plots show how different kinship relationships evolve over a person’s lifetime:
Note that we are working in a time-invariant framework. You can think of the results as analogous to life expectancy (i.e., expected years of life for a synthetic cohort experiencing a given set of period mortality rates).
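The life expectancy analogy can be made concrete with a small calculation. The survival probabilities below are hypothetical, and the half-year adjustment assumes that deaths occur, on average, midway through each age interval.

```r
# Hypothetical period survival probabilities for four age classes
px <- c(0.9, 0.8, 0.5, 0)

# Share of a synthetic cohort still alive at the start of each age
lx <- cumprod(c(1, px))

# Life expectancy at birth: person-years lived by the synthetic cohort,
# with deaths assumed to occur mid-interval
e0 <- sum(lx[-1]) + 0.5

e0
```

No real cohort ever experiences exactly these rates at every age. The kin counts produced by the time-invariant model are synthetic in the same sense.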
How does the overall family size (and family composition) vary over life for an average woman?
# Calculate total kin count at each age
counts <-
swe_2015$kin_summary %>%
group_by(age_focal) %>%
summarise(count_living = sum(count_living)) %>%
ungroup()
# Plot family composition over the life course
swe_2015$kin_summary %>%
select(age_focal, kin, count_living) %>%
rename_kin() %>%
ggplot(aes(x = age_focal, y = count_living)) +
geom_area(aes(fill = kin_label), color = "black", alpha = 0.8) +
geom_line(data = counts, linewidth = 1.5) +
labs(
title = "Family size and composition over the life course",
subtitle = "Based on Swedish demographic rates from 2015",
x = "Age of focal individual",
y = "Number of living female relatives",
fill = "Kin type"
) +
theme_bw() +
theme(legend.position = "bottom")
Interpretation: This stacked area chart reveals fascinating patterns in family size and composition throughout life:
The total family size (black line) shows an interesting U-shape, first declining as older relatives die, then rising again as new generations are born.
Beyond just counting relatives, we’re often interested in their age
distribution. Using the kin_full data frame, we can examine
the age distribution of Focal’s relatives at a specific age.
Let’s visualize the age distribution of relatives when Focal is 65 years old:
swe_2015$kin_full %>%
rename_kin() %>%
filter(age_focal == 65) %>%
ggplot(aes(age_kin, living)) +
geom_line(linewidth = 1) +
geom_vline(xintercept = 65, color = "red", linetype = "dashed") +
labs(
title = "Age distribution of living female relatives when Focal is 65",
subtitle = "Based on Swedish demographic rates from 2015 (red line = Focal's age)",
x = "Age of relative",
y = "Expected number of living relatives"
) +
theme_bw() +
facet_wrap(~kin_label, scales = "free_y")
Interpretation: These distributions provide rich information about family age structure:
Understanding age distributions is crucial for estimating care needs, support systems, and intergenerational transfers within families.
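The same information can be read in compact form from `kin_summary`, which reports a mean and standard deviation of kin age alongside each count (see the column list in the output structure above). A short sketch for Focal at age 65:

```r
library(DemoKin)
library(dplyr)

# Re-run the model only if it is not already in the workspace
if (!exists("swe_2015")) {
  swe_2015 <- kin(p = swe_px[, "2015"], f = swe_asfr[, "2015"], time_invariant = TRUE)
}

# Count, mean age, and spread of ages for each kin type when Focal is 65
kin_at_65 <- swe_2015$kin_summary %>%
  filter(age_focal == 65) %>%
  select(kin, count_living, mean_age, sd_age) %>%
  arrange(mean_age)

kin_at_65
```

Sorting by `mean_age` orders the kin types from the youngest to the oldest generation, which makes the table easy to compare with the faceted plot.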
In this tutorial, we’ve explored how to use the DemoKin
package to model kinship dynamics in a time-invariant, one-sex
framework. We’ve seen how different demographic patterns affect family
size and composition, and visualized these relationships across the life
course.
Key insights include:
In real-world applications, these models can inform:
- Planning for eldercare needs
- Understanding support systems for young families
- Estimating intergenerational wealth transfers
- Forecasting demographic dependency ratios
Advanced extensions to this model could include:
- Two-sex models (tracking both male and female relatives)
- Time-varying models (accounting for historical demographic change)
- Stochastic models (incorporating uncertainty)